RGB热点对象检测(SOD)结合了两个光谱,以分段图像中的视觉明显区域。大多数现有方法都使用边界图来学习锋利的边界。这些方法忽略了孤立的边界像素与其他自信像素之间的相互作用,从而导致了次优性能。为了解决这个问题,我们为基于SWIN Transformer的RGB-T SOD提出了一个职位感知关系学习网络(PRLNET)。 PRLNET探索像素之间的距离和方向关系,以增强阶层内的紧凑性和类间的分离,从而产生具有清晰边界和均匀区域的显着对象掩模。具体而言,我们开发了一个新颖的签名距离辅助模块(SDMAM)来改善编码器特征表示,该模块考虑了边界邻域中不同像素的距离关系。然后,我们使用定向字段(FRDF)设计一种功能改进方法,该方法通过利用明显对象内部的功能来纠正边界邻域的特征。 FRDF利用对象像素之间的方向信息有效地增强了显着区域的阶层紧凑性。此外,我们构成了一个纯变压器编码器 - 模块网络,以增强RGB-T SOD的多光谱特征表示。最后,我们对三个公共基准数据集进行了定量和定性实验。结果表明,我们所提出的方法的表现优于最新方法。
translated by 谷歌翻译
在小组活动识别中,层次结构框架被广泛采用以表示个人及其相应小组之间的关系,并实现了有希望的绩效。但是,现有方法在此框架中仅采用了最大/平均池,这忽略了不同个体对小组活动识别的不同贡献。在本文中,我们提出了一种新的上下文合并方案,名为Ascentive Pooling,该方案可以从个人动作到小组活动的加权信息过渡。通过利用注意机制,细心的合并是可解释的,并且能够将成员环境嵌入现有的层次模型中。为了验证拟议方案的有效性,设计了两种特定的专注合并方法,即全球细心合并(GAP)和分层的细心池(HAP)。差距奖励对小组活动意义重大的个体,而HAP通过引入亚组结构进一步考虑了层次结构。基准数据集上的实验结果表明,我们的建议在基线之外取得了显着优势,并且与最先进的方法相当。
translated by 谷歌翻译
尽管不变风险最小化(IRM)成功解决了分布式概括问题,但在实践中应用时,IRM仍可以损害最佳性。 IRM的实用变体,例如IRMV1,已被证明与IRM存在显着差距,因此即使在简单的问题中也可能无法捕获不变性。此外,IRMV1中的优化过程涉及两个内在冲突的目标,并且通常需要对客观权重进行仔细的调整。为了纠正上述问题,我们将IRM重新制定为多目标优化问题,并为IRM提出了一种新的优化方案,称为Pareto不变风险最小化(Pair)。对可以在客观冲突下适应优化指导。此外,我们表明对可以赋予实用的IRM变体能够在提供适当的指导时用原始IRM克服障碍。我们对ColoredMnist进行实验,以确认我们的理论和对的有效性。
translated by 谷歌翻译
尽管最近在欧几里得数据(例如图像)上使用不变性原理(OOD)概括(例如图像),但有关图数据的研究仍然受到限制。与图像不同,图形的复杂性质给采用不变性原理带来了独特的挑战。特别是,图表上的分布变化可以以多种形式出现,例如属性和结构,因此很难识别不变性。此外,在欧几里得数据上通常需要的域或环境分区通常需要的图形可能非常昂贵。为了弥合这一差距,我们提出了一个新的框架,以捕获图形的不变性,以在各种分配变化下进行保证的OOD概括。具体而言,我们表征了具有因果模型的图形上的潜在分布变化,得出结论,当模型仅关注包含有关标签原因最多信息的子图时,可以实现图形上的OOD概括。因此,我们提出了一个信息理论目标,以提取最大地保留不变的阶级信息的所需子图。用这些子图学习不受分配变化的影响。对合成和现实世界数据集进行的广泛实验,包括在AI ADED药物发现中充满挑战的环境,验证了我们方法的上等OOD概括能力。
translated by 谷歌翻译
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early 2020s. For example, modern ConvNets, represented by ConvNeXt, have demonstrated strong performance in various scenarios. While these models were originally designed for supervised learning with ImageNet labels, they can also potentially benefit from self-supervised learning techniques such as masked autoencoders (MAE). However, we found that simply combining these two approaches leads to subpar performance. In this paper, we propose a fully convolutional masked autoencoder framework and a new Global Response Normalization (GRN) layer that can be added to the ConvNeXt architecture to enhance inter-channel feature competition. This co-design of self-supervised learning techniques and architectural improvement results in a new model family called ConvNeXt V2, which significantly improves the performance of pure ConvNets on various recognition benchmarks, including ImageNet classification, COCO detection, and ADE20K segmentation. We also provide pre-trained ConvNeXt V2 models of various sizes, ranging from an efficient 3.7M-parameter Atto model with 76.7% top-1 accuracy on ImageNet, to a 650M Huge model that achieves a state-of-the-art 88.9% accuracy using only public training data.
translated by 谷歌翻译
A step-search sequential quadratic programming method is proposed for solving nonlinear equality constrained stochastic optimization problems. It is assumed that constraint function values and derivatives are available, but only stochastic approximations of the objective function and its associated derivatives can be computed via inexact probabilistic zeroth- and first-order oracles. Under reasonable assumptions, a high-probability bound on the iteration complexity of the algorithm to approximate first-order stationarity is derived. Numerical results on standard nonlinear optimization test problems illustrate the advantages and limitations of our proposed method.
translated by 谷歌翻译
Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet been criticized for learning inefficiency. We believe the insufficient utilization of training signals should be responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch with the disjoint regulation to raise the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual branch architecture to respectively predict invisible (masked) and visible (unmasked) tokens with superior learning targets. Rooting in orthogonal perspectives for training efficiency improvement, DM and JD cooperatively accelerate the training convergence yet not sacrificing the model generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) to report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks like semantic segmentation, object detection, etc., our DMJD also presents superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.
translated by 谷歌翻译
Considering the computation complexity, we propose a Guided Hybrid Quantization with One-to-one Self-Teaching (GHOST}) framework. More concretely, we first design a structure called guided quantization self-distillation (GQSD), which is an innovative idea for realizing lightweight through the synergy of quantization and distillation. The training process of the quantization model is guided by its full-precision model, which is time-saving and cost-saving without preparing a huge pre-trained model in advance. Second, we put forward a hybrid quantization (HQ) module to obtain the optimal bit width automatically under a constrained condition where a threshold for distribution distance between the center and samples is applied in the weight value search space. Third, in order to improve information transformation, we propose a one-to-one self-teaching (OST) module to give the student network a ability of self-judgment. A switch control machine (SCM) builds a bridge between the student network and teacher network in the same location to help the teacher to reduce wrong guidance and impart vital knowledge to the student. This distillation method allows a model to learn from itself and gain substantial improvement without any additional supervision. Extensive experiments on a multimodal dataset (VEDAI) and single-modality datasets (DOTA, NWPU, and DIOR) show that object detection based on GHOST outperforms the existing detectors. The tiny parameters (<9.7 MB) and Bit-Operations (BOPs) (<2158 G) compared with any remote sensing-based, lightweight or distillation-based algorithms demonstrate the superiority in the lightweight design domain. Our code and model will be released at https://github.com/icey-zhang/GHOST.
translated by 谷歌翻译
Automatic font generation without human experts is a practical and significant problem, especially for some languages that consist of a large number of characters. Existing methods for font generation are often in supervised learning. They require a large number of paired data, which are labor-intensive and expensive to collect. In contrast, common unsupervised image-to-image translation methods are not applicable to font generation, as they often define style as the set of textures and colors. In this work, we propose a robust deformable generative network for unsupervised font generation (abbreviated as DGFont++). We introduce a feature deformation skip connection (FDSC) to learn local patterns and geometric transformations between fonts. The FDSC predicts pairs of displacement maps and employs the predicted maps to apply deformable convolution to the low-level content feature maps. The outputs of FDSC are fed into a mixer to generate final results. Moreover, we introduce contrastive self-supervised learning to learn a robust style representation for fonts by understanding the similarity and dissimilarities of fonts. To distinguish different styles, we train our model with a multi-task discriminator, which ensures that each style can be discriminated independently. In addition to adversarial loss, another two reconstruction losses are adopted to constrain the domain-invariant characteristics between generated images and content images. Taking advantage of FDSC and the adopted loss functions, our model is able to maintain spatial information and generates high-quality character images in an unsupervised manner. Experiments demonstrate that our model is able to generate character images of higher quality than state-of-the-art methods.
translated by 谷歌翻译
Gaze estimation is the fundamental basis for many visual tasks. Yet, the high cost of acquiring gaze datasets with 3D annotations hinders the optimization and application of gaze estimation models. In this work, we propose a novel Head-Eye redirection parametric model based on Neural Radiance Field, which allows dense gaze data generation with view consistency and accurate gaze direction. Moreover, our head-eye redirection parametric model can decouple the face and eyes for separate neural rendering, so it can achieve the purpose of separately controlling the attributes of the face, identity, illumination, and eye gaze direction. Thus diverse 3D-aware gaze datasets could be obtained by manipulating the latent code belonging to different face attributions in an unsupervised manner. Extensive experiments on several benchmarks demonstrate the effectiveness of our method in domain generalization and domain adaptation for gaze estimation tasks.
translated by 谷歌翻译